Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slim down Element, various dtype-related improvements and changes #256

Merged
merged 21 commits into from
Jan 15, 2022

Conversation

aldanor
Copy link
Contributor

@aldanor aldanor commented Jan 9, 2022

@adamreichold Here comes, as promised 😄

This is the preliminary PR to enable further work on #254.

Although there's some breaking changes in here, it's probably better to do it all in one batch because many of these changes are inter-related. I believe all changes are improvements, in one way or another; and there's also some fixes.

An unsorted list of changes:

  • Replace Element::DATA_TYPE with Element::IS_POD
    • Otherwise, we would run into serious complications when trying to implement record types. Also, this is much simpler since all we need really is a quick way to check whether a type is pod or not.
  • Remove Element::same_type()
    • This was based on typenums and wouldn't work with more complex types. Now it uses PyArray_EquivTypes.
  • Use PyArray_NewFromDescr instead of PyArray_New to create arrays.
    • This detaches it from typenums and will allow using custom descriptors later on.
  • Added a FIXME note in FromPyObject::extract() - it currently doesn't check the instance type and only verifies that dtype is 'O' which may and will lead to unsafe behaviour and so it has to be fixed.
  • Split a weird ShapeError into DimensionalityError and TypeError. The latter is no longer typenum-based either and would work with any dtypes; formatting is left for numpy to handle.
  • Add methods to PyArrayDescr:
    • into_dtype_ptr() - an alternative to as_dtype_ptr() that increfs it; useful since numpy API often steals descriptor references
    • of<T>() - an equivalent to pybind11::dtype::of<T>()
    • is_equiv_to() - to check if descriptor types are equivalent (PyArray_EquivTypes)
    • get_typenum() is made public since DataType::from_typenum() is public; doesn't make much sense to hide it
    • object() - a shortcut for creating 'O' dtype (useful in user implementations of Element)
  • Add get_typenum() to DataType
  • Fix how integer types are mapped to npy types and added a test for it (which was previously failing):
    • As an example, DataType::Uint64 could be previously mapped to np.c_ulonglong instead of np.uint64 which is not the same thing. Now, u64 always maps to np.uint64.
    • Now it uses the same logic as numpy itself (and same as in pybind11)
    • Reverse conversions (npy -> DataType) have also been reimplemented and cleaned up
  • Implemented Element for isize

Open question: one thing that I find extremely confusing is that DataType::Complex32 maps to np.complex64 (and Complex64 maps to np.complex128). Wouldn't it make more sense if they were named consistently? (i.e. same as in numpy, 64/128)

(if this is accepted, I can also work on updating the changelog if needed so.)

@aldanor
Copy link
Contributor Author

aldanor commented Jan 9, 2022

One thing we've discussed in #254 and I'd like to point out here again:

IS_POD is actually somewhat redundant as well because there's dtype.hasobject flag (so the check itself may be implemented similarly to _may_have_objects()). The only question is whether we'd like to sacrifice a few nanoseconds to slim down Element to just a single get_dtype().

Copy link
Member

@kngwyu kngwyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, looks nice, though I haven't checked the error types yet

@adamreichold
Copy link
Member

Open question: one thing that I find extremely confusing is that DataType::Complex32 maps to np.complex64 (and Complex64 maps to np.complex128). Wouldn't it make more sense if they were named consistently? (i.e. same as in numpy, 64/128)

I guess this would be best resolved by dropping DataType entirely?

@aldanor
Copy link
Contributor Author

aldanor commented Jan 10, 2022

Open question: one thing that I find extremely confusing is that DataType::Complex32 maps to np.complex64 (and Complex64 maps to np.complex128). Wouldn't it make more sense if they were named consistently? (i.e. same as in numpy, 64/128)

I guess this would be best resolved by dropping DataType entirely?

That's a separate question, although a good point. I'll take a look if we can drop it now.

What about type aliases?

c32 <--> complex64
c64 <--> complex128

@adamreichold
Copy link
Member

What about type aliases?

I think we should leave them, they are consistent with the nomenclature of the num-complex crate which is what Rust code would use to actually work with complex numbers. Both perspectives, i.e. size of the aggregate and size of the components, seem understandable to me, and hence this feels like unnecessary breakage that just moves the confusion elsewhere but does not remove it.

@aldanor
Copy link
Contributor Author

aldanor commented Jan 10, 2022

What about type aliases?

I think we should leave them, they are consistent with the nomenclature of the num-complex crate which is what Rust code would use to actually work with complex numbers. Both perspectives, i.e. size of the aggregate and size of the components, seem understandable to me, and hence this feels like unnecessary breakage that just moves the confusion elsewhere but does not remove it.

Maybe, at the very least then, it's worth mentioning it in the docstrings of type aliases? (that c32 is np.complex64 etc).

@aldanor aldanor force-pushed the feature/slim-element branch from 6073114 to da7685d Compare January 10, 2022 08:56
@adamreichold
Copy link
Member

adamreichold commented Jan 10, 2022

Maybe, at the very least then, it's worth mentioning it in the docstrings of type aliases? (that c32 is np.complex64 etc).

Actually, I have considered this further, we should drop them outright and just use Complex32/64. Is not significantly more typing and it makes it clear that this refers to the types from num-complex and not some bespoke binding type.

The doc comment would make sense though, but I would then attach it to the trait impl for these two types (as the type aliases are no more).

@aldanor aldanor force-pushed the feature/slim-element branch 3 times, most recently from dff5812 to 072cd43 Compare January 10, 2022 09:34
@kngwyu
Copy link
Member

kngwyu commented Jan 10, 2022

I'm also OK to remove c32/c64 aliases

Copy link
Member

@kngwyu kngwyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @aldanor and @adamreichold, I like the way this PR simplified things

@aldanor
Copy link
Contributor Author

aldanor commented Jan 10, 2022

I'm also OK to remove c32/c64 aliases

Already done (Complex32 and Complex64 re-exports remain though; with extra docstring in their Element impl mentioning that they map to np.complex64 and np.complex128 respectively, to avoid confusion).

@aldanor
Copy link
Contributor Author

aldanor commented Jan 10, 2022

@adamreichold @kngwyu I'm almost done with dropping DataType entirely as suggested by @adamreichold - this simplifies things even further which looks good.

One (cosmetic) question though – do we want a top-level dtype::<T> or dtype_of::<T>() function perhaps? That would be a lot more obvious to newcomers than having to dig into documentation for PyArrayDescr (I bet most sane people who use numpy haven't ever heard the 'array descr' term, but everyone knows what dtype is, so having a function at crate level might be nice and friendly).

This came up from fixing multiple doctests that used to look like this:

assert_eq!(dtype.get_datatype().unwrap(), numpy::DataType::Int32);

The current version is

assert_eq!(dtype.is_equiv_to(numpy::PyArrayDescr::of::<i32>(py)));

My suggestion is to also add a top-level alias, like

assert_eq!(dtype.is_equiv_to(numpy::dtype::<i32>(py)));
// or
assert_eq!(dtype.is_equiv_to(numpy::dtype_of::<i32>(py)));

@adamreichold
Copy link
Member

do we want a top-level dtype:: or dtype_of::() function perhaps?

Sounds reasonable. I would go for dtype without _of and would like if it is defined right next to PyArrayDescr with a top-level re-export to keep things close together.

@adamreichold
Copy link
Member

(You can check the extensions by running cargo check/clippy --workspace locally.)

@aldanor aldanor force-pushed the feature/slim-element branch from aa6ec7a to 0c678d9 Compare January 10, 2022 12:33
@aldanor
Copy link
Contributor Author

aldanor commented Jan 10, 2022

(You can check the extensions by running cargo check/clippy --workspace locally.)

Yea, totally forgot about the --workspace. That's fixed now. (re: clippy - I've left a comment on #239)

@aldanor aldanor force-pushed the feature/slim-element branch from 91e7bda to 24abc17 Compare January 10, 2022 12:55
@aldanor
Copy link
Contributor Author

aldanor commented Jan 10, 2022

I think all of the questions, suggestions and concerns raised so far have been addressed and it should be good to go.

Should the unreleased section of the changelog be updated as well?

@adamreichold
Copy link
Member

Should the unreleased section of the changelog be updated as well?

Technically, this is breaking and hence should be mentioned. However, the only way a user of this crate could use Element until know was for trivial wrappers of primitive types, so I don't think this is actually used anywhere.

@aldanor aldanor force-pushed the feature/slim-element branch from 9d9e0b7 to 4021797 Compare January 10, 2022 13:28
@aldanor
Copy link
Contributor Author

aldanor commented Jan 10, 2022

Rebased on the latest master and updated the changelog (there's actually quite a few various public-facing changes, not just the Element change itself).

@adamreichold
Copy link
Member

Is anyone still planning to review this but hasn't had the time so far? If not, I would like merge this on the weekend as I think it is a definite improvement by simplification and has two approvals already.

@adamreichold adamreichold merged commit af698f7 into PyO3:main Jan 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants